Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence
Authors
Abstract
Accurately sensing a user’s interest in spoken dialog plays a significant role in many applications, such as tutoring systems and customer service systems. In addition to the widely used acoustic evidence, we introduce several lexical features for interest-level prediction and evaluate the impact of automatic speech recognition (ASR) errors on the effectiveness of lexical information. To capture contextual information, we combine the system’s hypothesis for the previous turn with its hypothesis for the current turn. Our final system uses a multi-level fusion method for this task, in which each fusion step draws on a different kind of information: acoustic and lexical cues, contextual information, or hypotheses from different classifiers. Our experiments show that various combinations improve system performance. In particular, we found that even though the word error rate is quite high, incorporating lexical information obtained from ASR output still yields a performance gain.
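The abstract does not state the fusion rule or the order of the fusion steps. Purely as an illustration, the Python sketch below assumes linear (weighted-average) fusion of interest scores in [0, 1], applied in three hypothetical levels: acoustic and lexical scores within each classifier, then across classifiers, then with the system's fused hypothesis for the previous turn. The function names, weights, score range, and step order are all assumptions, not the authors' method.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch: weighted-average score fusion is an assumption,
# not the fusion rule used in the paper.


def weighted_fusion(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of per-source interest scores."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def fuse_dialog(
    turns: List[List[Tuple[float, float]]],
    feature_weights: Tuple[float, float],
    classifier_weights: Sequence[float],
    context_weights: Tuple[float, float],
) -> List[float]:
    """Fuse interest scores turn by turn.

    Each turn holds one (acoustic_score, lexical_score) pair per classifier.
    """
    fused: List[float] = []
    previous: Optional[float] = None
    for turn in turns:
        # Level 1 (assumed): combine acoustic and lexical cues within each classifier.
        per_classifier = [weighted_fusion(pair, feature_weights) for pair in turn]
        # Level 2 (assumed): combine the hypotheses of the different classifiers.
        current = weighted_fusion(per_classifier, classifier_weights)
        # Level 3 (assumed): blend in the system's fused hypothesis for the previous turn.
        if previous is not None:
            current = weighted_fusion([current, previous], context_weights)
        fused.append(current)
        previous = current
    return fused


if __name__ == "__main__":
    scores = fuse_dialog(
        [[(0.8, 0.6), (0.7, 0.5)], [(0.2, 0.4), (0.3, 0.1)]],
        feature_weights=(0.6, 0.4),
        classifier_weights=(0.5, 0.5),
        context_weights=(0.7, 0.3),
    )
    print([round(s, 3) for s in scores])  # [0.67, 0.376]
```

Because the context step feeds back the previous turn's already-fused score, older turns influence the current hypothesis with geometrically decreasing weight; using the previous turn's pre-context score instead would limit the context to exactly one turn.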
Similar articles
Detecting Levels of Interest from Spoken Dialog with Multistream Prediction Feedback and Similarity Based Hierarchical Fusion Learning
Detecting levels of interest from speakers is a new problem in Spoken Dialog Understanding with significant impact on real world business applications. Previous work has focused on the analysis of traditional acoustic signals and shallow lexical features. In this paper, we present a novel hierarchical fusion learning model that takes feedback from previous multistream predictions of prominent s...
Data Fusion and Multi-Criteria Decision Making for Producing Oil and Gas Resources Potential Maps (Case Study: Saracheh Zone, Qom Province)
This paper focuses on the application of Geoinformatic methods (the simultaneous use of remote sensing, geographic information system, global positioning system, terrestrial and aerial photogrammetry) in optimal operation and exploration risk reduction of oil and gas reservoirs. To approach the purpose, two aspects of remote sensing (satellite image) and terrestrial and aerial photogrammetry have...
Detecting emotional state of a child in a conversational computer game
The automatic recognition of user’s communicative style within a spoken dialog system framework, including the affective aspects, has received increased attention in the past few years. For dialog systems, it is important to know not only what was said but also how something was communicated, so that the system can engage the user in a richer and more natural interaction. This paper addresses t...
Object Level Strategy for Spectral Quality Assessment of High Resolution Pan-sharpen Images
Panchromatic and multi-spectral images produced by the remote sensing satellites are fused together to provide a multi-spectral image with a high spatial resolution at the same time. The spectral quality of the fused images is very important because the quality of a large number of remote sensing products depends on it. Due to the importance of the spectral quality of the fused images, its eval...
Confirmation detection in human-agent interaction using non-lexical speech cues
Even if only the acoustic channel is considered, human communication is highly multi-modal. Non-lexical cues provide a variety of information such as emotion or agreement. The ability to process such cues is highly relevant for spoken dialog systems, especially in assistance systems. In this paper, we focus on the recognition of non-lexical confirmations such as “mhm”, as they enhance the syste...
Publication date: 2010